Template-Based Information Extraction without the Templates
Abstract
Standard algorithms for template-based information extraction (IE) require predefined template schemas, and often labeled data, to learn to extract their slot fillers (e.g., an embassy is the Target of a Bombing template). This paper describes an approach to template-based IE that removes this requirement and performs extraction without knowing the template structure in advance. Our algorithm instead learns the template structure automatically from raw text, inducing template schemas as sets of linked events (e.g., bombings include detonate, set off, and destroy events) associated with semantic roles. We also solve the standard IE task, using the induced syntactic patterns to extract role fillers from specific documents. We evaluate on the MUC-4 terrorism dataset and show that we induce template structure very similar to hand-created gold structure, and we extract role fillers with an F1 score of 0.40, approaching the performance of algorithms that require full knowledge of the templates.
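The idea of inducing a template as a set of linked events can be illustrated with a toy sketch. The code below is not taken from the paper: it assumes each document has already been reduced to a set of event words, scores word pairs by pointwise mutual information (PMI) over document co-occurrence, and groups words with single-link merging. The function names `pmi_scores` and `cluster_events`, the threshold, and the four example documents are all hypothetical.

```python
from collections import defaultdict
from itertools import combinations
import math


def pmi_scores(docs):
    """Return {(a, b): PMI} for event-word pairs that co-occur in a document.

    docs is a list of sets of event words (an assumed preprocessing step).
    """
    n = len(docs)
    word_count = defaultdict(int)
    pair_count = defaultdict(int)
    for events in docs:
        for w in events:
            word_count[w] += 1
        for a, b in combinations(sorted(events), 2):
            pair_count[(a, b)] += 1
    # PMI = log( P(a, b) / (P(a) * P(b)) ), with document-level probabilities.
    return {
        (a, b): math.log(c * n / (word_count[a] * word_count[b]))
        for (a, b), c in pair_count.items()
    }


def cluster_events(docs, threshold=0.0):
    """Group event words whose pairwise PMI exceeds threshold (single-link)."""
    parent = {}

    def find(x):
        # Union-find lookup with path halving.
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for events in docs:
        for w in events:
            find(w)
    for (a, b), score in pmi_scores(docs).items():
        if score > threshold:
            parent[find(a)] = find(b)

    clusters = defaultdict(set)
    for w in list(parent):
        clusters[find(w)].add(w)
    return sorted((sorted(c) for c in clusters.values()), key=lambda c: c[0])


# Hypothetical toy corpus: two bombing-like and two kidnapping-like documents.
docs = [
    {"detonate", "destroy", "set_off"},
    {"detonate", "destroy"},
    {"kidnap", "release"},
    {"kidnap", "release", "demand"},
]
clusters = cluster_events(docs)
print(clusters)
```

On this toy input the bombing-like and kidnapping-like words fall into separate groups, which is the intuition behind treating a template as a cluster of related events. Attaching semantic roles and extracting role fillers, as the abstract describes, is outside the scope of this sketch.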
Similar resources
Header Metadata Extraction from Semi-structured Documents Using Template Matching
With the recent proliferation of documents, automatic metadata extraction from documents has become an important task. In this paper, we propose a novel template-matching-based method for header metadata extraction from semi-structured documents stored in PDF. In our approach, templates are defined, and the document is considered as strings with format. Templates are used to guide finite state auto...
Can Wavelet Denoising Improve Motor Unit Potential Template Estimation?
Background: Electromyographic (EMG) signals obtained from a contracted muscle contain valuable information on its activity and health status. Much of this information lies in motor unit potentials (MUPs) of its motor units (MUs), collected during the muscle contraction. Hence, accurate estimation of a MUP template for each MU is crucial. Objective: To investigate the possibility of improv...
The LOLITA User-Definable Template Interface
The development of user-definable template interfaces, which allow the user to design new template definitions in a user-friendly way, is a new issue in the field of information extraction. The LOLITA user-definable template interface allows the user to define new templates using sentences in natural language text with a few restrictions and formal elements. This approach is rather different f...
Web Template Extraction Based on Hyperlink Analysis
Web templates are one of the main development resources for website engineers. Templates allow them to increase productivity by plugging content into already formatted and prepared pagelets. For the end user, templates are also useful because they provide uniformity and a common look and feel across all webpages. However, from the point of view of crawlers and indexers, templates are an important ...
Pii: S0031-3203(96)00086-6
We propose an improved method for eye-feature extraction, description, and tracking using deformable templates. Some existing algorithms are exploited to locate the initial position of eye features, and then deformable templates are used for extracting and describing the eye features. Rather than using original energy minimization for matching the templates, the region-based approach is propos...